Enhancing the ability to tell “where” acute malnutrition is of concern

Results from a quest using probability proportional to population size (PPS)-based survey data from Karamoja, Uganda and North Darfur, Sudan

Tomás Zaba

2025-03-27

Introduction

Why Spatial Analysis?


  • To provide actionable information on where.

  • Currently, when an analysis is done, the whole polygon is classified as one, hiding the spatial variation of acute malnutrition.

  • Current protocols try to address this need by:

    • Guiding users to disaggregate surveys when DEFF is >= 1.3 (heterogeneous distribution across surveyed areas)
    • This is also included in the FRC reviews
  • However, this is still not effective or efficient in providing actionable information to IPC end-users.

  • It does not inform targeting (of resources, and reaching those in utmost need of treatment)

We need a spatial dimension…

…Why?

  • Our analyses inform life-saving interventions, particularly in highly vulnerable countries;
  • Countries struggle to identify the population most in need, which should be served first.
    • Different strategies are adopted to define within-district/county/etc. targeting.
    • These are based on empirical knowledge and are not evidence-based.

A few examples:

Somalia: The Operational Priority Areas (OPA)

  • Step 1: ranking of districts from the highest IPC Phase to the lowest
  • Step 2: within-district targeting is further based on two criteria: (i) most common vulnerable livelihood zones; (ii) high population density zones.

Mozambique: farthest communities are most vulnerable

  • Targeting consists of selecting the communities farthest from the district centre, on the assumption that they are underserved with basic services and therefore most vulnerable.

So what…?

  • …while the strategies are logical, they do not always hold and are not predictable, given the complexity and wide range of factors that lead to acute malnutrition.

  • Enhancing the ability to tell where acute malnutrition is of concern in IPC would enhance its crucial contribution in providing information to save lives.

In this regard, I conducted operational research that consisted of:

  • Predicting the prevalence of acute malnutrition at unsurveyed/unsampled locations using data from sampled locations -> spatial interpolation.

Based on the first law of geography and of spatial epidemiology:

“Nearby things are more similar than distant things.”

Tobler, W. (1970)


We do apply this law in our protocols -> protocols for similar areas

Questions 🧐

  1. Does spatial interpolation produce reliable (precise and accurate) estimates using small scale survey data, such as district level surveys?


  2. How comparable are the predicted estimates to the observed prevalence estimates from the original survey results?

Data & Preparation

Data Source

Two example datasets were used:


  1. Nine district-level SMART surveys conducted in 9 districts of Karamoja Region, Uganda.
    • Data collected in April 2021


  2. Locality-level SMART surveys conducted in North Darfur, Sudan
    • Data collected in October 2024

Data Wrangling

Aspatial

flowchart TD

A(WFHZ)
B(MUAC)
C(Exclude rows with missing GPS coord.)
D(Calculate WFHZ and define AMN)
E(Remove outliers)
F(Calculate MFAZ and define AMN)
G(Remove outliers)
H(Get % aggregated at cluster ID)

A --> C --> D --> E --> H
B --> C --> F --> G --> H
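The aspatial steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: it assumes the z-scores (WFHZ or MFAZ) are already computed, that acute malnutrition is defined as a z-score below -2, and that outliers are values more than 3 z-score units from the observed mean. The field names (`cluster`, `lat`, `lon`, `wfhz`) and the example records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cluster_prevalence(rows, z_field="wfhz", cutoff=-2.0, flag_sd=3.0):
    """Return {cluster_id: % of children with z-score below cutoff}."""
    # Step 1: exclude rows with missing GPS coordinates (or a missing z-score).
    rows = [r for r in rows
            if r.get("lat") is not None and r.get("lon") is not None
            and r.get(z_field) is not None]
    if not rows:
        return {}
    # Step 2: remove outliers, here values more than flag_sd z-score units
    # from the observed mean (an assumed flagging rule). Doing this before or
    # after defining cases gives the same cluster percentages.
    centre = mean(r[z_field] for r in rows)
    rows = [r for r in rows if abs(r[z_field] - centre) <= flag_sd]
    # Step 3: define cases and aggregate to a percentage per cluster ID.
    counts = defaultdict(lambda: [0, 0])  # cluster -> [cases, children]
    for r in rows:
        counts[r["cluster"]][0] += int(r[z_field] < cutoff)
        counts[r["cluster"]][1] += 1
    return {c: 100.0 * cases / n for c, (cases, n) in counts.items()}


# Hypothetical child-level records, for illustration only.
rows = [
    {"cluster": 1, "lat": 2.5, "lon": 34.1, "wfhz": -2.2},
    {"cluster": 1, "lat": 2.5, "lon": 34.1, "wfhz": -0.8},
    {"cluster": 1, "lat": None, "lon": None, "wfhz": -3.1},  # no GPS: dropped
    {"cluster": 2, "lat": 2.7, "lon": 34.3, "wfhz": -1.1},
    {"cluster": 2, "lat": 2.7, "lon": 34.3, "wfhz": 0.3},
    {"cluster": 2, "lat": 2.7, "lon": 34.3, "wfhz": 6.9},    # outlier: dropped
]
result = cluster_prevalence(rows)
print(result)
```

The same function can be pointed at the MUAC branch by passing `z_field="mfaz"`, assuming a column of that name exists.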

Spatial

flowchart TD

A(Set CRS and/or reproject CRS)
B(Get mean GPS coord. by cluster ID)
C(Calculate spatial weights)
D(Smooth rates)
E(Krige)

A --> B --> C --> D --> E
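A rough sketch of the middle steps of this flow (cluster centroids, spatial weights, rate smoothing) is given below. The slides do not say which weights or smoother were used, so the choices here are assumptions: binary k-nearest-neighbour weights and a simple spatial rate smoother that pools cases and children over each cluster and its neighbours. Coordinates are assumed to be already projected to a planar CRS, and all names and values are hypothetical. Kriging itself is not shown.

```python
from math import hypot


def centroids(points):
    """Mean x/y per cluster ID from (cluster_id, x, y) child-level records."""
    sums = {}
    for cid, x, y in points:
        sx, sy, n = sums.get(cid, (0.0, 0.0, 0))
        sums[cid] = (sx + x, sy + y, n + 1)
    return {cid: (sx / n, sy / n) for cid, (sx, sy, n) in sums.items()}


def knn_neighbours(xy, k=1):
    """k nearest clusters of each cluster (binary spatial weights)."""
    out = {}
    for cid, (x, y) in xy.items():
        ranked = sorted((hypot(x - ox, y - oy), oid)
                        for oid, (ox, oy) in xy.items() if oid != cid)
        out[cid] = [oid for _, oid in ranked[:k]]
    return out


def smooth_rates(cases, totals, neighbours):
    """Pool cases and children over each cluster and its neighbours (in %)."""
    smoothed = {}
    for cid, nbrs in neighbours.items():
        window = [cid] + nbrs
        smoothed[cid] = (100.0 * sum(cases[c] for c in window)
                         / sum(totals[c] for c in window))
    return smoothed


# Hypothetical projected coordinates and counts, for illustration only.
points = [("A", 0, 0), ("A", 2, 0), ("B", 10, 0), ("C", 0, 10), ("D", 100, 90)]
cases = {"A": 2, "B": 6, "C": 1, "D": 0}
totals = {"A": 20, "B": 30, "C": 10, "D": 20}

xy = centroids(points)
nbrs = knn_neighbours(xy, k=1)
smoothed = smooth_rates(cases, totals, nbrs)
print(smoothed)
```

Pooling counts with neighbours stabilises rates from small clusters, at the cost of blurring sharp local differences. A larger `k` smooths more.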

Assessment of Model-fit

Cross-validation: leave-one-out resampling method

source: ArcGIS Pro


How does it work?

  • After estimating the interpolation model from all blue points, the value of the red point is hidden, and the remaining points are used to predict the value of the hidden point. The prediction is then compared to the measured value. This process repeats for all 10 points.
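The leave-one-out loop described above can be sketched as follows. To keep the example self-contained, inverse-distance weighting (IDW) stands in for the kriging model, so this shows the resampling logic only and is not the interpolator used in the analysis. In kriging the fitted variogram is kept fixed while each point is hidden, whereas IDW has no model to fit. The sample points are hypothetical.

```python
from math import hypot


def idw_predict(x, y, samples, power=2.0):
    """Inverse-distance-weighted mean of sample values at location (x, y)."""
    num = den = 0.0
    for sx, sy, value in samples:
        d = hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly on a sample point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def leave_one_out(samples, predict=idw_predict):
    """Hide each point in turn, predict it from the rest.

    Returns a list of (observed, predicted) pairs, one per sample point.
    """
    pairs = []
    for i, (x, y, observed) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        pairs.append((observed, predict(x, y, rest)))
    return pairs


# Hypothetical cluster locations (x, y) and prevalence (%), illustration only.
samples = [(0, 0, 10.0), (1, 0, 12.0), (0, 1, 8.0), (1, 1, 10.0)]
pairs = leave_one_out(samples)
for observed, predicted in pairs:
    print(f"observed={observed:.1f}  predicted={predicted:.2f}")
```

The resulting pairs are what a predicted-versus-observed scatter plot and its R² are built from.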

Results

Spatial Variation of GAM by WFHZ

Survey sampling points

Predicted surface map

Choropleth map: County

Choropleth map: District

Predicted Estimates of GAM by WFHZ

Observed district prevalence estimates vs predicted prevalence
| District | Observed prevalence (%) | Predicted prevalence (%) | Bias (predicted − observed) | Minimum prevalence (%) | Maximum prevalence (%) | Median prevalence (%) |
|---|---|---|---|---|---|---|
| Abim | 6.28 | 9.46 | 3.19 | 0.60 | 11.31 | 6.52 |
| Amudat | 10.04 | 12.39 | 2.34 | 3.14 | 15.23 | 8.80 |
| Kaabong | 18.07 | 6.50 | -11.57 | 0.93 | 29.76 | 13.42 |
| Karenga | 8.19 | 4.44 | -3.75 | 0.31 | 22.31 | 6.19 |
| Kotido | 8.01 | 7.76 | -0.25 | 0.93 | 15.69 | 6.86 |
| Moroto | 11.85 | 11.59 | -0.26 | 0.00 | 28.15 | 10.56 |
| Nabilatuk | 7.50 | 5.02 | -2.48 | 0.00 | 23.04 | 9.21 |
| Nakapiripirit | 7.26 | 8.47 | 1.22 | 0.00 | 18.41 | 8.77 |
| Napak | 7.77 | 8.46 | 0.69 | 0.00 | 18.84 | 7.98 |
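As a quick check on the table, the bias values (predicted minus observed) can be recomputed and summarised. The snippet below uses only the observed and predicted figures shown above. The summary statistics (mean bias, mean absolute bias) are unweighted district averages added here for illustration and are not part of the original results. Small differences from the printed bias column, such as 3.18 versus 3.19 for Abim, come from rounding.

```python
# (district, observed %, predicted %) copied from the GAM by WFHZ table.
rows = [
    ("Abim", 6.28, 9.46),
    ("Amudat", 10.04, 12.39),
    ("Kaabong", 18.07, 6.50),
    ("Karenga", 8.19, 4.44),
    ("Kotido", 8.01, 7.76),
    ("Moroto", 11.85, 11.59),
    ("Nabilatuk", 7.50, 5.02),
    ("Nakapiripirit", 7.26, 8.47),
    ("Napak", 7.77, 8.46),
]

# Bias in percentage points: positive means the model over-predicts.
bias = {district: pred - obs for district, obs, pred in rows}

mean_bias = sum(bias.values()) / len(bias)
mean_abs_bias = sum(abs(b) for b in bias.values()) / len(bias)
worst = max(bias, key=lambda d: abs(bias[d]))

print(f"mean bias: {mean_bias:.2f} pp")
print(f"mean absolute bias: {mean_abs_bias:.2f} pp")
print(f"largest absolute bias: {worst} ({bias[worst]:.2f} pp)")
```

On these figures the mean bias is about -1.2 percentage points and the mean absolute bias about 2.9, with Kaabong contributing the largest gap.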

Did the Model Fit the Data?

Predicted rates in the cross-validation results against the observed rates


R² = 0.806

  • Positive and strong correlation
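For reference, an R² of this kind can be obtained as the squared Pearson correlation between cross-validated predictions and observed rates. The sketch below assumes that definition, which the slides do not state explicitly, and uses hypothetical values rather than the survey data.

```python
from math import sqrt


def r_squared(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / sqrt(var_o * var_p)
    return r * r


# Hypothetical cross-validation output (%), for illustration only.
observed = [5.0, 10.0, 15.0, 20.0]
predicted = [6.0, 9.0, 16.0, 19.0]
r2 = r_squared(observed, predicted)
print(f"R² = {r2:.3f}")
```

A high R² shows that predictions track the observed rates, but it does not by itself rule out a systematic offset, so it is best read alongside the bias values.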

Spatial Variation of GAM by MUAC

Survey sampling points

Predicted surface map

Choropleth map: County

Choropleth map: District

Predicted Estimates of GAM by MUAC

Observed district prevalence estimates vs predicted prevalence
| District | Observed prevalence (%) | Predicted prevalence (%) | Bias (predicted − observed) | Minimum prevalence (%) | Maximum prevalence (%) | Median prevalence (%) |
|---|---|---|---|---|---|---|
| Abim | 4.20 | 2.41 | -1.79 | 0.00 | 22.77 | 4.25 |
| Amudat | 2.57 | 1.42 | -1.15 | 0.00 | 13.13 | 2.55 |
| Kaabong | 22.36 | 15.78 | -6.58 | 5.18 | 33.61 | 19.33 |
| Karenga | 10.03 | 11.93 | 1.90 | 2.56 | 22.39 | 9.92 |
| Kotido | 15.61 | 17.49 | 1.88 | 3.27 | 32.75 | 17.75 |
| Moroto | 14.76 | 16.36 | 1.60 | 0.33 | 27.16 | 15.63 |
| Nabilatuk | 10.59 | 11.75 | 1.16 | 0.51 | 27.16 | 11.23 |
| Nakapiripirit | 12.62 | 12.76 | 0.15 | 1.54 | 18.39 | 12.82 |
| Napak | 8.30 | 10.20 | 1.89 | 3.99 | 28.40 | 9.07 |

Did the Model Fit the Data?

R² = 0.836

  • Positive and strong correlation

Uncertainty

What influences high uncertainty?

Standardized Prediction Standard Errors

\(Zscore = \frac{\text{Prediction} - \text{Observed Value}}{\text{Kriging Standard Errors}}\)

GAM by WFHZ

GAM by MUAC

Interpretation

  • Z = 0: prediction is exactly equal to the observed value
  • Positive Z: prediction is higher than the observed value.
  • Negative Z: prediction is lower than the observed value.

It tells how many kriging standard errors the predicted value is away from the observed value.

  • -3 < Z < 3 👍
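The standardized error can be computed point by point as in the short sketch below, which follows the formula given above and checks each value against the ±3 band. The function name and the input values are hypothetical.

```python
def standardized_error(prediction, observed, kriging_se):
    """(prediction - observed) / kriging standard error."""
    if kriging_se <= 0:
        raise ValueError("kriging standard error must be positive")
    return (prediction - observed) / kriging_se


# Hypothetical (prediction %, observed %, kriging SE) triples.
points = [(12.0, 10.0, 1.6), (7.5, 9.0, 1.0), (20.0, 8.0, 2.5)]
z_scores = [standardized_error(p, o, se) for p, o, se in points]
within_band = [-3 < z < 3 for z in z_scores]

for z, ok in zip(z_scores, within_band):
    print(f"Z = {z:+.2f}  {'within ±3' if ok else 'outside ±3'}")
```

Points falling outside the band are locations where the prediction misses the observed value by more than the model's own stated uncertainty would suggest.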

Limitations


  • While the mean predicted prevalence appears concise and consistent between the district and regional prediction models, and is in most cases nearly equal to the observed prevalence in the original survey

Actionable Insights


  • Based on the results, spatial interpolation using PPS-based survey data (e.g., SMART) appears to generate reliable estimates for decision-making.
    • However, this could be due to chance. Further validation with additional data is necessary.

Actionable Insights for Standard IPC analyses


  • Results may be a better solution for highlighting hotspots and informing programme targeting.

  • Predicted results at a lower administrative level could be a breakthrough for estimating the number of children in need of treatment using district/county/locality-specific prevalence when surveys are done at a higher administrative level, as in Somalia, where surveys are done at livelihood-zone level.

  • Results could also be a better solution than the current approach in the IPC AMN protocols, where surveys should be disaggregated when DEFF >1.3, with 5 clusters and at least 100 observations.

Actionable Insights for FRC Reviews


Results may be a better solution for highlighting hotspots and informing programme targeting.

  • Possible advantages
    • Affected countries would be able to tell where to prioritize/target.
  • Possible disadvantages
    • I do not see any that are relevant compared to the advantages.

Actionable Insights for Risk Analysis


By highlighting areas that are more affected than others:

  • It will clearly identify areas that are on the brink of crossing IPC AMN Phase 5 thresholds, allowing increased monitoring of the risk factors.

Next steps


The approach needs to be validated with more data.

  • SMART or other representative survey data.
    • Can be district/county/locality-specific survey or
    • Regional/province/higher admin level survey and interpolate to lower admin levels.
    • Can also try to model using South Sudan FSNMS data where few (9) clusters are sampled in each county and then aggregated at State or domain level for analysis.
  • Sentinel sites data:
    • Available options:
      • Kenya NDMA sentinel site data: they collect GPS coordinates. Access to the data must be requested from the Kenya NDMA.